Prediction of blood cancer using leukemia gene expression data and sparsity-based gene selection methods

Authors

  • Mahshid Bahrami, Department of Radiology, Isfahan University of Medical Sciences, Isfahan, Iran
  • Morteza Zangeneh Soroush, Department of Biomedical Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran
  • Razieh Sheikhpour, Department of Computer Engineering, Faculty of Engineering, Ardakan University, P.O. Box 184, Ardakan, Iran
  • Sanaz Mehrabani, Non-Communicable Pediatric Diseases Research Center, Health Research Institute, Babol University of Medical Sciences, Babol, Iran
Abstract:

Background: DNA microarray technology simultaneously assesses the expression of thousands of genes and can be used to detect cancer types and cancer biomarkers. This study aimed to predict blood cancer using leukemia gene expression data and a robust ℓ2,p-norm sparsity-based gene selection method.

Materials and Methods: In this descriptive study, microarray gene expression data from 72 patients with acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL) were used. To remove redundant genes and identify the genes most important for predicting AML and ALL, a robust ℓ2,p-norm (0 < p ≤ 1) sparsity-based gene selection method was applied, with the parameter p set to 1/4, 1/2, 3/4, and 1. The selected genes were then passed to random forest (RF) and support vector machine (SVM) classifiers to predict AML and ALL.

Results: The RF and SVM classifiers correctly classified all AML and ALL samples. The RF classifier achieved a performance of 100% using 10 genes selected by the ℓ2,1/2-norm and ℓ2,1-norm sparsity-based gene selection methods. The SVM classifier likewise achieved a performance of 100% using 10 genes selected by the ℓ2,1/2-norm method. Seven genes were selected under all four values of p in the ℓ2,p-norm method as the most important genes for classifying AML and ALL, and the gene described as “PRTN3 Proteinase 3 (serine proteinase, neutrophil, Wegener granulomatosis autoantigen)” was identified as the most important gene.

Conclusion: The results indicate that blood cancer can be predicted from leukemia microarray gene expression data using the robust ℓ2,p-norm sparsity-based gene selection method together with classification algorithms. Examining the expression levels of the genes identified in this study may be useful for predicting leukemia.
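The abstract does not describe how the ℓ2,p-norm method is solved, so the following is only a minimal sketch of the quantity it is named after. It assumes the usual setup for this family of methods: a weight matrix W with one row per gene, an ℓ2,p-norm defined as (Σ_i ‖w_i‖_2^p)^(1/p), and genes ranked by the ℓ2 norm of their rows. The matrix values and the function names (row_l2_norms, l2p_norm, top_genes) are hypothetical and chosen for illustration; they are not taken from the paper.

```python
import math


def row_l2_norms(W):
    """Return the Euclidean (l2) norm of each row of W (one row per gene)."""
    return [math.sqrt(sum(v * v for v in row)) for row in W]


def l2p_norm(W, p):
    """l2,p-norm of W for 0 < p <= 1: (sum_i ||w_i||_2 ** p) ** (1 / p)."""
    if not 0 < p <= 1:
        raise ValueError("p must satisfy 0 < p <= 1")
    return sum(n ** p for n in row_l2_norms(W)) ** (1.0 / p)


def top_genes(W, k):
    """Indices of the k rows with the largest l2 norm, largest first."""
    norms = row_l2_norms(W)
    return sorted(range(len(norms)), key=lambda i: norms[i], reverse=True)[:k]


# Hypothetical weight matrix: 4 genes (rows) x 2 classes (columns).
W = [
    [3.0, 4.0],  # row norm 5
    [0.0, 0.0],  # row norm 0: this gene contributes nothing
    [0.6, 0.8],  # row norm 1
    [0.0, 2.0],  # row norm 2
]

# The four values of p reported in the abstract.
for p in (0.25, 0.5, 0.75, 1.0):
    print(p, l2p_norm(W, p))

# Rank genes by row norm and keep the two strongest.
print(top_genes(W, 2))
```

In a full pipeline, W would be learned from the expression data under the ℓ2,p-norm penalty, and only the top-ranked genes would be passed to the RF and SVM classifiers.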


Similar resources

Classification and Biomarker Genes Selection for Cancer Gene Expression Data Using Random Forest

Background & objective: Microarray and next-generation sequencing (NGS) data are important sources for finding helpful molecular patterns. The large volume of gene expression data also makes it harder to identify the biomarkers associated with cancer. The random forest (RF) is used to effectively analyze the problems of large-p and smal...


Gene Identification from Microarray Data for Diagnosis of Acute Myeloid and Lymphoblastic Leukemia Using a Sparse Gene Selection Method

Background: Microarray experiments can simultaneously determine the expression of thousands of genes. Identification of potential genes from microarray data for diagnosis of cancer is important. This study aimed to identify genes for the diagnosis of acute myeloid and lymphoblastic leukemia using a sparse feature selection method. Materials and Methods: In this descriptive study, the expressio...


Feature Selection and Classification of Microarray Gene Expression Data of Ovarian Carcinoma Patients using Weighted Voting Support Vector Machine

DNA microarray gene expression profiling gives access to a wealth of information across thousands of variables (genes). Analyzing this information can reveal the genetic causes of disease and the differences between tumors. In this study we try to reduce the high-dimensional data with a statistical method, selecting high-impact genes as biomarkers, and then classify ovarian tumors based on gene expression data of...


Selection of more than one gene at a time for cancer prediction from gene expression data

A new gene selection method capable of selecting more than one gene at a time is introduced. This characteristic sets it apart from almost all known methods, which assume that there are no interactions between genes. The only exception is the pairwise gene selection method recently proposed by Bø and Jonassen [3]. Motivated by this method, we compare it with ours. Classification into healthy tissue an...


SFLA Based Gene Selection Approach for Improving Cancer Classification Accuracy

In this paper, we propose a new gene selection algorithm based on the Shuffled Frog Leaping Algorithm, called SFLA-FS. The proposed algorithm is used to improve cancer classification accuracy. Most biological datasets, such as cancer datasets, have a large number of genes and few samples. However, most of these genes are not usable in some tasks, for example in cancer classification....


Feature Selection for Cancer Classification Using Microarray Gene Expression Data

DNA microarray technology enables us to measure the expression levels of thousands of genes simultaneously, providing a great opportunity for cancer diagnosis and prognosis. The number of genes often exceeds tens of thousands, whereas the number of subjects available is often no more than a hundred. Therefore, it is necessary and important to perform gene selection for classification purposes. A go...




Journal title

volume 13, issue 1

pages 13-21

publication date 2023-01


Keywords
